Using supplier locality in power-aware interconnects and caches in chip multiprocessors

نویسندگان

  • Ehsan Atoofian
  • Amirali Baniasadi
چکیده

Conventional snoopy-based chip multiprocessors take an aggressive approach broadcasting snoop requests to all nodes. In addition each node checks all received requests. This approach reduces the latency of cache to cache transfer misses at the expense of increasing power. In this paper we show that a large portion of interconnect/cache transactions are redundant as many snoop requests miss in the remote nodes. We exploit this inefficiency and introduce power optimization techniques for chip multiprocessors. Our optimizations rely on the observation that in a snoopy-based shared memory system the data supplier can be predicted with high accuracy. Our optimizations reduce power by eliminating unnecessary activity at both the requester and the supplier end of snoop requests. We reduce power as we (a) avoid broadcasting snoop requests to all processors and (b) avoid tag lookup for all nodes and for all requests arriving. In particular, we use supplier locality and introduce the following two optimizations. First, and at the requester end, we introduce speculative selective request (SSR) to reduce power dissipation in the binary tree interconnect. In SSR, we send the request only to the node more likely to have the missing data. We reduce power as we limit access only to the interconnect components between the requestor and the supplier node. Second, and at the supplier end, we propose speculative tag lookup (STL) to reduce power dissipation in data caches. We filter those accesses more likely to miss in the L1 cache. Using shared memory applications, we show that by limiting snoop requests to the speculated nodes we reduce interconnect power by 25% in a four-way multiprocessor. Moreover, we show that speculative tag lookup reduces power in tag arrays by 14.1% in a four-way multiprocessor. Both optimizations come with negligible performance loss and hardware overhead. 2007 Elsevier B.V. All rights reserved.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A fine-grained thread-aware management policy for shared caches

Two of the main sources of inefficiency in current caches are the non-uniform distribution of the memory accesses across the cache sets, which causes misses due to the mapping restrictions of non fully-associative caches, and the access patterns with little locality that degrade the performance of caches under the traditional LRU replacement policy. This paper proposes a technique to tackle in ...

متن کامل

Hybrid Shared-aware Cache Coherence Transition Strategy

Chip-multiprocessors have played a significant role in real parallel computer architecture design. For integrating tens of cores into a chip, designs tend towards with physically distributed last level caches. This naturally results in a Non-Uniform Cache Access design, where on-chip access latencies depend on the physical distances between requesting cores and home cores where the data is cach...

متن کامل

Exploring the performance of split data cache schemes on superscalar processors and symmetric multiprocessors

Current technology continues providing smaller and faster transistors, so processor architects can offer more complex and functional ILP processors, because manufacturers can fit more transistors on the same chip area. As a consequence, the fraction of chip area reachable in a single clock cycle is dropping, and at the same time the number of transistors on the chip is increasing. However, prob...

متن کامل

Distance Analysis for Large - Scale Chip Multiprocessors

Title of dissertation: REUSE DISTANCE ANALYSIS FOR LARGE-SCALE CHIP MULTIPROCESSORS Meng-Ju Wu, Doctor of Philosophy, 2012 Dissertation directed by: Professor Donald Yeung Department of Electrical and Computer Engineering Multicore Reuse Distance (RD) analysis is a powerful tool that can potentially provide a parallel program’s detailed memory behavior. Concurrent Reuse Distance (CRD) and Priva...

متن کامل

Cost-aware Topology Customization of Mesh-based Networks-on-Chip

Nowadays, the growing demand for supporting multiple applications causes to use multiple IPs onto the chip. In fact, finding truly scalable communication architecture will be a critical concern. To this end, the Networks-on-Chip (NoC) paradigm has emerged as a promising solution to on-chip communication challenges within the silicon-based electronics. Many of today’s NoC architectures are based...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Journal of Systems Architecture - Embedded Systems Design

دوره 54  شماره 

صفحات  -

تاریخ انتشار 2008